An Approach to State Aggregation for POMDPs
Abstract
A partially observable Markov decision process (POMDP) provides an elegant model for problems of planning under uncertainty. Solving POMDPs is very computationally challenging, however, and improving the scalability of POMDP algorithms is an important research problem. One way to reduce the computational complexity of planning using POMDPs is by using state aggregation to reduce the (effective) size of the state space. State aggregation techniques that rely on a factored representation of a POMDP have been developed in previous work. In this paper, we describe similar techniques that do not rely on a factored representation. These techniques are simpler to implement and make this approach to reducing the complexity of POMDPs more general. We describe state aggregation techniques that allow both exact and approximate solution of non-factored POMDPs and demonstrate their effectiveness on a range of benchmark problems.
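To make the idea concrete, the sketch below shows one way exact aggregation might look for a flat (non-factored) POMDP: a partition of the state space is repeatedly refined until no two states in the same block can be distinguished by their rewards, their observation probabilities, or the transition mass they send into the current blocks. The array shapes and the partition-refinement loop are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def aggregate_states(T, O, R):
    """Merge states of a flat POMDP that behave identically.

    Assumed shapes (illustrative, not the paper's notation):
      T[a, s, s2] = P(s2 | s, a)      transitions,   (A, S, S)
      O[a, s2, z] = P(z | s2, a)      observations,  (A, S, Z)
      R[a, s]     = expected reward,  (A, S)
    Returns an array mapping each state to its block index.
    """
    A, S, _ = T.shape
    blocks = np.zeros(S, dtype=int)  # start with a single block
    while True:
        n_blocks = blocks.max() + 1
        # Signature of a state: its current block, its reward and
        # observation rows, and the transition mass it sends into
        # each current block, for every action.
        sigs = []
        for s in range(S):
            mass = np.stack([
                np.bincount(blocks, weights=T[a, s], minlength=n_blocks)
                for a in range(A)
            ])
            sigs.append((int(blocks[s]),
                         np.round(R[:, s], 9).tobytes(),
                         np.round(O[:, s], 9).tobytes(),
                         np.round(mass, 9).tobytes()))
        uniq = {sig: i for i, sig in enumerate(dict.fromkeys(sigs))}
        new_blocks = np.array([uniq[sig] for sig in sigs])
        if new_blocks.max() + 1 == n_blocks:
            return blocks  # no block was split: the partition is stable
        blocks = new_blocks
```

Loosening the rounding in the signatures, so that states with merely similar rather than identical behavior are merged, is the natural knob for trading exactness against a smaller aggregate model, in the spirit of the approximate variant the abstract mentions.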
Similar Resources
Solving POMDPs by Searching in Policy Space
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. ...
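As a concrete illustration of the controller representation, the sketch below evaluates a deterministic finite-state controller on a POMDP by solving the linear Bellman equations over joint (node, state) pairs; this is the evaluation step that policy-space search alternates with controller improvement. The names psi (node actions) and eta (node transitions) follow common FSC notation and are assumptions here, as are the array shapes.

```python
import numpy as np

def evaluate_fsc(psi, eta, T, O, R, gamma=0.95):
    """Evaluate a deterministic finite-state controller on a POMDP.

    psi[q]    : action taken in controller node q
    eta[q, z] : successor node after observing z
    T[a, s, s2], O[a, s2, z], R[a, s] : assumed POMDP arrays.
    Returns V with V[q, s] = expected discounted return from (q, s).
    """
    Q, S, Z = len(psi), T.shape[1], O.shape[2]
    n = Q * S
    M = np.zeros((n, n))
    b = np.zeros(n)
    for q in range(Q):
        a = psi[q]
        for s in range(S):
            i = q * S + s
            b[i] = R[a, s]
            for s2 in range(S):
                for z in range(Z):
                    j = eta[q, z] * S + s2
                    M[i, j] += gamma * T[a, s, s2] * O[a, s2, z]
    # Solve (I - M) V = b, the Bellman equations for this controller.
    V = np.linalg.solve(np.eye(n) - M, b)
    return V.reshape(Q, S)
```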
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...
Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs
Decentralized POMDPs (DEC-POMDPs) offer a rich model for planning under uncertainty in multiagent settings. Improving the scalability of solution techniques is an important challenge. While an optimal algorithm has been developed for infinite-horizon DEC-POMDPs, it often requires an intractable amount of time and memory. To address this problem, we present a heuristic version of this algorithm. ...
PEGASUS: A policy search method for large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an “equivalent” POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the...
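The core trick lends itself to a short sketch: fix all of the simulator's randomness in advance (a set of "scenarios"), so that the estimated value of a policy becomes a deterministic function of the policy and different policies are compared on identical random outcomes. The `reset`/`step` generative-model interface below is a hypothetical stand-in, not PEGASUS's actual API.

```python
import numpy as np

def scenario_value(policy, reset, step, n_scenarios=100, horizon=50,
                   gamma=0.95, seed=0):
    """Estimate a policy's value with randomness fixed up front.

    reset(rng) -> initial state; step(state, action, rng) -> (state, reward)
    are an assumed generative-model interface; policy maps states to actions.
    """
    root = np.random.default_rng(seed)
    scenario_seeds = root.integers(0, 2**31, size=n_scenarios)
    total = 0.0
    for sd in scenario_seeds:
        rng = np.random.default_rng(int(sd))  # one fixed scenario
        s = reset(rng)
        ret, disc = 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, policy(s), rng)
            ret += disc * r
            disc *= gamma
        total += ret
    # Because the seeds are fixed, two policies evaluated with this
    # estimator face exactly the same random outcomes.
    return total / n_scenarios
```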
Efficient Approximate Value Iteration for Continuous Gaussian POMDPs
We introduce a highly efficient method for solving continuous partially-observable Markov decision processes (POMDPs) in which beliefs can be modeled using Gaussian distributions over the state space. Our method enables fast solutions to sequential decision making under uncertainty for a variety of problems involving noisy or incomplete observations and stochastic actions. We present an efficient ...
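For beliefs that stay Gaussian, the belief update itself reduces to a Kalman-style predict/correct step; the sketch below shows that update under assumed linear-Gaussian dynamics (methods of this kind typically handle nonlinear systems via linearization, which is omitted here). The matrix names are standard Kalman-filter notation, not the paper's.

```python
import numpy as np

def gaussian_belief_update(mu, Sigma, u, z, A, B, C, Qn, Rn):
    """One Gaussian belief update for assumed linear dynamics:
        x' = A x + B u + w,   w ~ N(0, Qn)
        z  = C x' + v,        v ~ N(0, Rn)
    Returns the mean and covariance of the posterior belief over x'.
    """
    # Predict: propagate the belief through the dynamics.
    mu_pred = A @ mu + B @ u
    Sigma_pred = A @ Sigma @ A.T + Qn
    # Correct: condition the predicted belief on the observation z.
    S = C @ Sigma_pred @ C.T + Rn            # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    mu_new = mu_pred + K @ (z - C @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new
```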